Practical Named Entity Tagging using Co-training
Abstract
[3] and [1] opened the possibility of using an unlabeled corpus to classify named entities through co-training, a semi-supervised learning algorithm. Our approach to Korean named entity classification also adopts a co-training method, DL-CoTrain. However, for robustness and practicality, we extract the contextual features of a named entity using only a part-of-speech tagger and a simple noun phrase chunker instead of a full parser. We discuss the linguistic features of Korean that are valuable for named entity classification, and we show experimentally how large a labeled corpus, and which unlabeled corpus, is needed for a named entity classifier to achieve superior performance and portability. With only a quarter of the labeled corpus, our method can compete with its supervised counterpart.
Similar resources
Addressing Scalability Issues of Named Entity Recognition Using Multi-Class Support Vector Machines
This paper explores the scalability issues associated with solving the Named Entity Recognition (NER) problem using Support Vector Machines (SVM) and high-dimensional features. The performance results of a set of experiments conducted using binary and multi-class SVM with increasing training data sizes are examined. The NER domain chosen for these experiments is the biomedical publications doma...
The hunvec framework for NN-CRF-based sequential tagging
In this work we present the open source hunvec framework for sequential tagging, built upon Theano and Pylearn2. The underlying statistical model, which connects linear CRFs with neural networks, was used by Collobert and co-workers, and several other researchers. To demonstrate the flexibility of our tool, we describe a set of experiments on part-of-speech and named-entity-recognition tasks...
Named Entity Recognition in Persian Text using Deep Learning
Named entity recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...
A Bootstrapping Approach to Named Entity Classification Using Successive Learners
This paper presents a new bootstrapping approach to named entity (NE) classification. This approach only requires a few common noun/pronoun seeds that correspond to the concept for the target NE type, e.g. he/she/man/woman for PERSON NE. The entire bootstrapping procedure is implemented as training two successive learners: (i) a decision list is used to learn the parsing-based high precision NE...
Biomedical Named Entity Recognition Using Support Vector Machines: Performance vs. Scalability Issues
This paper examines the performance and scalability of Named Entity Recognition (NER) using multi-class Support Vector Machines (SVM) and high-dimensional features. The NER domain chosen for these experiments is the biomedical publications domain, especially selected due to its importance and inherent challenges. We use a simple machine learning approach that eliminates prior language knowledge...
Publication date: 2002